The Pruning Power: Theory and Heuristics for Mining Databases with Multiple k-Nearest-Neighbor Queries

Authors

  • Christian Böhm
  • Bernhard Braunmüller
  • Hans-Peter Kriegel
Abstract

Numerous data mining algorithms rely heavily on similarity queries. Although many or even all of the performed queries do not depend on each other, the algorithms process them sequentially. Recently, a novel technique for efficiently processing multiple similarity queries issued simultaneously has been introduced. It was shown that multiple similarity queries substantially speed up query-intensive data mining applications. For the important case of multiple k-nearest-neighbor queries on top of a multidimensional index structure, the problem of scheduling directory pages and data pages arises. This aspect has not been addressed so far. In this paper, we derive the theoretical foundation of this scheduling problem. Additionally, we propose several scheduling algorithms based on our theoretical results. In our experimental evaluation, we show that considering the maximum priority of pages clearly outperforms other scheduling approaches.

Similar resources

Search Space Reductions for Nearest-Neighbor Queries

The vast number of applications featuring multimedia and geometric data has made the R-tree a ubiquitous data structure in databases. A popular and fundamental operation on R-trees is nearest neighbor search. While nearest neighbor on R-trees has received considerable experimental attention, it has received somewhat less theoretical consideration. We study pruning heuristics for nearest neighbo...

Optimizing All-Nearest-Neighbor Queries with Trigonometric Pruning

Many applications need to determine the k nearest neighbors for multiple query points simultaneously. This task is known as an all-(k)-nearest-neighbor (AkNN) query. In this paper, we suggest a new method for efficient AkNN query processing which is based on spherical approximations for indexing and query set representation. In this setting, we propose trigonometric pruning which enables a sign...

Efficiently Supporting Multiple Similarity Queries for Mining in Metric Databases

Metric databases are databases where a metric distance function is defined for pairs of database objects. In such databases, similarity queries in the form of range queries or k-nearest neighbor queries are the most important queries. In traditional query processing, single queries are issued independently by different users. In many data mining applications, however, the database is typically ...

On the Generalization of Nearest Neighbor Queries

Nearest neighbor queries on R-trees use a number of pruning techniques to improve the search. We examine three common 1-nearest neighbor pruning strategies and generalize them to k-nearest neighbors. This generalization clears up a number of prior misconceptions. Specifically, we show that the generalization of one pruning technique, referred to as strategy 2, is non-trivial and requires the in...

Multiple Similarity Queries: A Basic DBMS Operation for Mining in Metric Databases

Metric databases are databases where a metric distance function is defined for pairs of database objects. In such databases, similarity queries in the form of range queries or k-nearest neighbor queries are the most important query types. In traditional query processing, single queries are issued independently by different users. In many data mining applications, however, the database is typica...

Publication date: 2000